Guide Actor-Critic for Continuous Control

Authors

  • Abbas Abdolmaleki
  • Masashi Sugiyama
Abstract

Actor-critic methods solve reinforcement learning problems by updating a parameterized policy, known as an actor, in a direction that increases an estimate of the expected return, known as a critic. However, existing actor-critic methods use only values or gradients of the critic to update the policy parameter. In this paper, we propose a novel actor-critic method called the guide actor-critic (GAC). GAC first learns a guide actor that locally maximizes the critic and then updates the policy parameter based on the guide actor by supervised learning. Our main theoretical contributions are twofold. First, we show that GAC updates the guide actor by performing second-order optimization in the action space, where the curvature matrix is based on the Hessians of the critic. Second, we show that the deterministic policy gradient method is a special case of GAC when the Hessians are ignored. Through experiments, we show that our method is a promising reinforcement learning method for continuous control.
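
The two-stage idea in the abstract can be illustrated with a toy sketch. It is not the paper's update rule, since the abstract gives no formulas. It assumes a hypothetical quadratic critic, a hypothetical `damping` term added to the negated Hessian to form the curvature matrix, and a linear policy fitted by least squares as the supervised step. All names (`critic_grad_hess`, `guide_action`, `M`, `A`) are illustrative.

```python
import numpy as np

def critic_grad_hess(a, a_star, A):
    """Gradient and Hessian (w.r.t. the action) of the toy critic
    Q(a) = -0.5 * (a - a_star)^T A (a - a_star). Assumed for illustration."""
    grad = -A @ (a - a_star)
    hess = -A
    return grad, hess

def guide_action(a, grad, hess, damping, use_hessian=True):
    """One curvature-preconditioned ascent step in action space.
    `damping` is a hypothetical regularizer, not taken from the paper."""
    curvature = damping * np.eye(a.shape[0])
    if use_hessian:
        curvature = curvature - hess  # second-order information from the critic
    return a + np.linalg.solve(curvature, grad)

A = np.array([[4.0, 0.0], [0.0, 1.0]])   # curvature of the toy critic
M = np.array([[1.0, 0.5], [-0.5, 2.0]])  # toy maximizer: a_star(s) = M @ s

# Single-state comparison: with the Hessian vs. with the Hessian ignored.
a0 = np.zeros(2)
a_star = np.array([1.0, -2.0])
grad, hess = critic_grad_hess(a0, a_star, A)
second_order = guide_action(a0, grad, hess, damping=1e-8)
first_order = guide_action(a0, grad, hess, damping=10.0, use_hessian=False)

# Stage 1: compute guide actions for a batch of states.
rng = np.random.default_rng(0)
states = rng.normal(size=(64, 2))
W = np.zeros((2, 2))  # current linear policy: a = W @ s
guides = []
for s in states:
    a = W @ s
    g, h = critic_grad_hess(a, M @ s, A)
    guides.append(guide_action(a, g, h, damping=1e-8))
guides = np.stack(guides)

# Stage 2: supervised step, regress the policy onto the guide actions.
W_new = np.linalg.lstsq(states, guides, rcond=None)[0].T
```

In this toy setting the second-order step lands almost exactly on the critic's maximizer, while the step that ignores the Hessian reduces to plain gradient ascent on the action with step size `1 / damping`, which mirrors the abstract's remark that a gradient-only method arises when the Hessians are dropped.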


Similar resources

Reinforcement Learning through Asynchronous Advantage Actor-Critic on a GPU

We introduce a hybrid CPU/GPU version of the Asynchronous Advantage Actor-Critic (A3C) algorithm, currently the state-of-the-art method in reinforcement learning for various gaming tasks. We analyze its computational traits and concentrate on aspects critical to leveraging the GPU's computational power. We introduce a system of queues and a dynamic scheduling strategy, potentially helpful for ot...


A Direct Control Method For a Class of Nonlinear Systems Using Neural Networks

CUED/F-INFENG/TR.65. Cambridge University Engineering Department, Trumpington Street, Cambridge CB2 1PZ, England. March 1991. A direct control scheme for a class of continuous time nonlinear systems using neural networks is presented. The objective of control is to track a desired reference signal. This objective is achieved through input/output linearization of the system with neural networks. The...


Improving Communication of Critical Domain Knowledge in High-Consequence Software Development: An Empirical Study

K. S. Hanks; University of Virginia; Charlottesville, Virginia. J. C. Knight; University of Virginia; Charlottesville, Virginia. Keywords: requirements, natural language, safety-critical. Abstract: Poor requirements are implicated in a disproportionate number of defects in sa...


Lecture Notes in Computer Science

Nonlinear stabilization by hybrid quantized feedback. Daniel Liberzon, Dept. of Elect. Eng., Yale University, New Haven, CT 06520-8267, U.S.A. daniel.liberzon@yale.edu. Abstract. This paper is concerned with global asymptotic stabilization of continuous-time control systems by means of quantized feedback. For linear systems, a hybrid control strategy for dealing with this problem was recently proposed ...


G-frames and their duals for Hilbert C*-modules

Abstract. Certain facts about frames and generalized frames (g-frames) are extended for the g-frames for Hilbert C*-modules. It is shown that g-frames for Hilbert C*-modules share several useful properties with those for Hilbert spaces. The paper also characterizes the operators which preserve the class of g-frames for Hilbert C*-modules. Moreover, a necessary and sufficient condition is ob- ...




Publication date: 2018